Edit Distance Algorithm
Edit distance, also known as Levenshtein distance, measures how dissimilar two strings are: it is the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into the other. A distance of 0 means the strings are identical. It is widely used in spell checking, natural language processing, DNA sequence alignment, and data entry validation.
The algorithm builds an (m + 1) x (n + 1) matrix by dynamic programming, where m and n are the lengths of the two strings. The cell in row i and column j holds the edit distance between the first i characters of the first string and the first j characters of the second. The first row and first column are initialized to 0, 1, 2, and so on, because converting to or from an empty prefix takes one insertion or deletion per character. Every other cell is filled from its neighbors: if the two current characters match, it copies the top-left cell; otherwise it takes 1 plus the minimum of the left (insert), top (remove), and top-left (replace) cells. The bottom-right cell then holds the edit distance between the two full strings. The algorithm runs in O(mn) time and, with the full matrix, O(mn) space, which makes it practical for strings of moderate length.
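The fill rule can also be written as a recurrence. In this sketch, D(i, j) is the value in row i and column j, a is the first string, and b is the second; these symbols are introduced here only for illustration and do not appear in the code.

```latex
\[
\begin{aligned}
D(i, 0) &= i, \qquad D(0, j) = j, \\
D(i, j) &=
\begin{cases}
D(i-1, j-1) & \text{if } a_i = b_j, \\
1 + \min\bigl\{ D(i, j-1),\; D(i-1, j),\; D(i-1, j-1) \bigr\} & \text{otherwise.}
\end{cases}
\end{aligned}
\]
```

For example, with a = "cat" and b = "cut", the only mismatch is the middle character, so D(3, 3) = 1.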
/* Given two strings str1 and str2, find the minimum
 * number of edits (operations) required to convert
 * str1 into str2. The operations allowed on str1 are:
 *   a. Insert
 *   b. Remove
 *   c. Replace
 * All of the above operations have equal cost.
 */
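#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Minimum of three integers.
int min(int x, int y, int z)
{
    return std::min(std::min(x, y), z);
}

/* A naive recursive C++ program to find the
 * minimum number of operations to convert
 * str1 to str2. m and n are the lengths of the
 * prefixes of str1 and str2 being compared.
 * Exponential time in the worst case (no matching
 * characters); O(3^(m+n)) is an upper bound.
 */
int editDist(const string &str1, const string &str2, int m, int n)
{
    // Prefix of str1 is empty: insert all n characters of str2.
    if (m == 0)
        return n;
    // Prefix of str2 is empty: remove all m characters of str1.
    if (n == 0)
        return m;
    // If the last characters are the same, skip them
    // and recur for the remaining prefixes.
    if (str1[m - 1] == str2[n - 1])
        return editDist(str1, str2, m - 1, n - 1);
    // If the last characters differ, try all three
    // operations and take the cheapest one.
    return 1 + min(editDist(str1, str2, m, n - 1),      // Insert
                   editDist(str1, str2, m - 1, n),      // Remove
                   editDist(str1, str2, m - 1, n - 1)); // Replace
}

/* A DP-based program.
 * O(m x n) time and space.
 */
int editDistDP(const string &str1, const string &str2, int m, int n)
{
    // Table of subproblem results: dp[i][j] is the edit distance
    // between the first i characters of str1 and the first j of str2.
    vector<vector<int>> dp(m + 1, vector<int>(n + 1));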
    // Fill dp[][] in a bottom-up manner.
    for (int i = 0; i <= m; i++)
    {
        for (int j = 0; j <= n; j++)
        {
            // If str1 is empty, insert all j characters of str2.
            if (i == 0)
                dp[i][j] = j;
            // If str2 is empty, remove all i characters of str1.
            else if (j == 0)
                dp[i][j] = i;
            // If the characters match, reuse the result for
            // the prefixes without them.
            else if (str1[i - 1] == str2[j - 1])
                dp[i][j] = dp[i - 1][j - 1];
            // Otherwise take the cheapest of the three operations.
            else
                dp[i][j] = 1 + min(dp[i][j - 1],      // Insert
                                   dp[i - 1][j],      // Remove
                                   dp[i - 1][j - 1]); // Replace
        }
    }
    return dp[m][n];
}
int main()
{
    string str1 = "sunday";
    string str2 = "saturday";
    int m = static_cast<int>(str1.length());
    int n = static_cast<int>(str2.length());
    // Both versions print 3 for this pair.
    cout << editDist(str1, str2, m, n) << endl;
    cout << editDistDP(str1, str2, m, n) << endl;
    return 0;
}
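The table-based version above stores all (m + 1) x (n + 1) cells, but each row depends only on the row directly above it. The following is a minimal sketch of a variant that keeps just two rows, reducing extra space to O(n) while leaving the O(mn) running time unchanged. The function name editDistTwoRows is hypothetical and not part of the original program.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not in the original program): Levenshtein
// distance computed with two rows instead of the full table.
int editDistTwoRows(const std::string &a, const std::string &b)
{
    const std::size_t m = a.size();
    const std::size_t n = b.size();
    std::vector<int> prev(n + 1), curr(n + 1);

    // Row 0: building the first j characters of b from an
    // empty string takes j insertions.
    for (std::size_t j = 0; j <= n; ++j)
        prev[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= m; ++i)
    {
        // Column 0: reducing the first i characters of a to an
        // empty string takes i removals.
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= n; ++j)
        {
            if (a[i - 1] == b[j - 1])
                curr[j] = prev[j - 1];
            else
                curr[j] = 1 + std::min({curr[j - 1],   // Insert
                                        prev[j],       // Remove
                                        prev[j - 1]}); // Replace
        }
        // The row just filled becomes the "previous" row.
        std::swap(prev, curr);
    }
    return prev[n];
}
```

Calling editDistTwoRows("sunday", "saturday") returns 3. The trade-off is that the full table is no longer available afterwards, so the actual sequence of edits cannot be reconstructed from it.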